Structural compression for document analysis

Authors

  • Omid E. Kia
  • David S. Doermann
Abstract

In this paper we describe a structural compression technique for the storage and retrieval of document text images. The primary objective is to provide efficient representation, storage, transmission, and display. A secondary objective is to provide an encoding that allows access to specified regions within the image and facilitates traditional document processing operations without requiring complete decoding. We describe an algorithm that symbolically decomposes a document image and structurally orders the error bitmap based on a probabilistic model. The resulting symbol and error representations lend themselves to reasonably high compression ratios and are structured to allow operations directly on the compressed image. The compression scheme is implemented and compared to traditional compression methods.
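The idea of a symbolic decomposition with a separate error bitmap can be illustrated with a small sketch. This is not the authors' algorithm: it assumes glyphs have already been segmented into equal-sized binary bitmaps, it matches each glyph to a prototype by a simple Hamming-distance threshold, and it leaves out the probabilistic ordering of the error bitmap that the abstract mentions. All names (`encode`, `decode`, `max_error`) are hypothetical.

```python
from typing import List, Tuple

# A glyph bitmap is a tuple of rows, each row a tuple of 0/1 pixels.
Bitmap = Tuple[Tuple[int, ...], ...]
Symbol = Tuple[int, Bitmap]  # (prototype index, error bitmap)


def xor_bitmap(a: Bitmap, b: Bitmap) -> Bitmap:
    """Pixel-wise XOR of two bitmaps of the same shape."""
    return tuple(tuple(p ^ q for p, q in zip(ra, rb)) for ra, rb in zip(a, b))


def same_shape(a: Bitmap, b: Bitmap) -> bool:
    """True if both bitmaps have identical row lengths, row by row."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Number of pixels in which two same-shaped bitmaps differ."""
    return sum(sum(row) for row in xor_bitmap(a, b))


def encode(glyphs: List[Bitmap], max_error: int = 1) -> Tuple[List[Bitmap], List[Symbol]]:
    """Replace each glyph by a prototype index plus an error (residual) bitmap.

    Assumption: a glyph matches a prototype when they have the same shape
    and differ in at most `max_error` pixels; otherwise the glyph becomes
    a new prototype.
    """
    prototypes: List[Bitmap] = []
    symbols: List[Symbol] = []
    for glyph in glyphs:
        best_index = None
        best_distance = max_error + 1
        for index, proto in enumerate(prototypes):
            if same_shape(glyph, proto):
                distance = hamming(glyph, proto)
                if distance < best_distance:
                    best_index, best_distance = index, distance
        if best_index is None:
            prototypes.append(glyph)
            best_index = len(prototypes) - 1
        symbols.append((best_index, xor_bitmap(glyph, prototypes[best_index])))
    return prototypes, symbols


def decode(prototypes: List[Bitmap], symbols: List[Symbol]) -> List[Bitmap]:
    """Losslessly rebuild every glyph from its prototype and error bitmap."""
    return [xor_bitmap(prototypes[index], error) for index, error in symbols]
```

In this sketch the symbol stream (prototype indices) can be inspected without touching the error bitmaps, which loosely mirrors the abstract's goal of operating on the compressed form; the error bitmaps are mostly zeros and would be the part handed to an entropy coder.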

Similar resources

Proceedings of the International Conference on Pattern Recognition, volume C, pages 664-668, 1996: Structural Compression for Document

In this paper we describe a structural compression technique to be used for document text image storage and retrieval. The primary objective is to provide an efficient representation, storage, transmission and display. A secondary objective is to provide an encoding which allows access to specified regions within the image and facilitates traditional document processing operations without requirin...

Document Image Compression and Analysis

Doctor of Philosophy dissertation, 1997. Image compression usually considers the minimization of storage space as its main objective. It is desirable, however, to code images so that we have the ability to process the resulting representation directly. In this ...

Group 4 Compressed Document Matching

Numerous approaches, including textual, structural and featural, for detecting duplicate documents have been investigated. Considering document images are usually stored and transmitted in compressed forms, it is advantageous to perform document matching directly on the compressed data. A two-stage process for matching Group 4 compressed document images is presented. In the coarse matching stag...

Proceedings of the International Conference on Image Processing, 1996: Structure-Preserving Document Image Compression

Maintaining a document in image form is often preferable in order to avoid the high cost of manual conversion or the introduction of large numbers of errors by automatic OCR and/or graphics interpretation. The large volume of data in the image can be greatly reduced by using compression techniques. Text-intensive document images typically have a great deal of redundancy in the bitmap representa...

Structure-preserving document image compression

Maintaining a document in image form is often preferable in order to avoid the high cost of manual conversion or the introduction of large numbers of errors by automatic OCR and/or graphics interpretation. The large volume of data in the image can be greatly reduced by using compression techniques. Text-intensive document images typically have a great deal of redundancy in the bitmap representa...

Publication date: 1996